Likelihood ratios and Bayesian inference for Poisson channels






Anthony Réveillac
Institut für Mathematik
Humboldt-Universität zu Berlin
Unter den Linden 6 
10099 Berlin 
Germany 



Abstract — In recent years, infinite-dimensional methods 
have been introduced for the estimation of Gaussian channels.
The aim of this paper is to study the application of similar 
methods to Poisson channels. In particular we compute 
the Bayesian estimator of a Poisson channel using the 
likelihood ratio and the discrete Malliavin gradient. This 
algorithm is suitable for numerical implementation via the 
Monte-Carlo scheme. As an application we provide a new
proof of the formula obtained recently in [5] relating some 
derivatives of the input-output mutual information of a 
time-continuous Poisson channel and the conditional mean 
estimator of the input. These results are then extended to 
mixed Gaussian-Poisson channels. 

Index Terms — Poisson process, Bayesian estimation, 
Malliavin calculus, mutual information, extended De 
Bruijn identities. 
I. Introduction



Recently in [13], infinite-dimensional methods have been used to derive a new expression of the conditional mean estimator for infinite-dimensional additive Gaussian channels. More precisely, the conditional mean estimator is obtained as the Malliavin derivative of the logarithm of the likelihood ratio. In [13] this relation is used to show that the derivative of the input-output mutual information with respect to the signal-to-noise ratio of an additive Gaussian channel can be expressed in terms of the risk of the conditional mean estimator of the input. This fundamental connection was first established using a different approach. In addition, the counterpart of this connection for Poisson channels has been obtained very recently by Guo, Shamai and Verdú in [3] and [5]. The aim of this paper is two-fold: first we prove that for a general Poisson channel, the conditional mean estimator can be obtained as the discrete Malliavin gradient of the likelihood ratio. Then, as an application, we present a new proof of the connection mentioned above obtained in [3] and [5]. Note that as an intermediate result, we provide extended De Bruijn identities (in the sense of [13, Section VI]). Let us make the statements mentioned previously more precise.

In the general framework of additive Gaussian channels, an observed signal $Y$ is decomposed into the sum of an input signal $X$ plus an independent Gaussian noise $w$ as

\[ Y = \rho X + w, \tag{1.1} \]

where $\rho$ is the "signal to noise ratio". In this context the signals "lie" in an abstract Wiener space $(W, H, \mu_w)$ where $W$ is a separable Banach space, $H$ is a Hilbert space densely and continuously embedded in $W$ and $\mu_w$ is a Gaussian measure on $W$. In particular the input $X$ (resp. the output $Y$) is an $H$-valued (resp. $W$-valued) random variable. This setting contains the case of an observed continuous-time stochastic process $(Y_t)_{t\in[0,T]}$ (with values in the space of continuous functions $W := C([0,T])$) related to an input stochastic process $(X_t)_{t\in[0,T]}$ (with values in the Hilbert space $H := L^2([0,T])$) by the following stochastic differential equation,

\[ dY_t = \rho X_t\,dt + dW_t, \quad t\in[0,T], \tag{1.2} \]

where $(W_t)_{t\in[0,T]}$ is a real-valued standard Brownian motion independent of $(X_t)_{t\in[0,T]}$ and $\rho$ denotes the "signal to noise ratio". In [13, Prop 4.1], it is shown that

\[ E[X|\mathcal{Y}] = \frac{1}{\rho}\nabla\log l(Y), \tag{1.3} \]

where $\mathcal{Y}$ denotes the sigma field generated by $Y$, $\nabla$ denotes the Malliavin gradient, which is an infinite-dimensional counterpart of the usual derivative on $\mathbb{R}^n$, and $l$ is the likelihood ratio associated to model (1.1), that is,

\[ l := \frac{d\mu_Y}{d\mu_w}. \]



Relation (1.3) entails the following result ([13, Proposition 5.1]),

\[ \frac{dI(X;Y)}{d\rho} = \rho\,E\big[\|X - E[X|\mathcal{Y}]\|_H^2\big], \tag{1.4} \]

where $I(X;Y)$ denotes the mutual information between $X$ and $Y$, defined as,

\[ I(X;Y) := \int_{H\times W}\log\frac{d\mu_{X,Y}}{d(\mu_X\times\mu_Y)}(x,y)\,\mu_{X,Y}(dx,dy). \]



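This relation had previously been obtained for time-continuous Gaussian channels using different techniques. Regarding these results one can ask the following question: can we find counterparts of relations (1.3) and (1.4) in a non-Gaussian setting? An answer has been recently given in [3] and [5] for the Poisson regime. Let $Y = (Y_t)_{t\in[0,T]}$ be a Poisson process on $[0,T]$ with intensity measure $\big(\int_0^t \lambda + \alpha X_s\,ds\big)_{t\in[0,T]}$ where $\alpha, \lambda > 0$ and $(X_t)_{t\in[0,T]}$ is a positive stochastic process. Then it is shown in [5, Theorems 3-4] that

\[ \frac{d}{d\alpha} I(X;Y) = \frac{1}{\alpha}E\Big[\int_{[0,T]} \psi_\lambda(\alpha X_s+\lambda) - \psi_\lambda(E[\alpha X_s+\lambda|\mathcal{Y}])\,\nu(ds)\Big] \tag{1.5} \]

and

\[ \frac{d}{d\lambda} I(X;Y) = E\Big[\int_{[0,T]} \log(\alpha X_s+\lambda) - \log(E[\alpha X_s+\lambda|\mathcal{Y}])\,\nu(ds)\Big], \tag{1.6} \]

where $\psi_\lambda(x) := (x-\lambda)\log(x)$ and $\nu$ denotes the Lebesgue measure on $[0,T]$.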



In this paper, we first extend Zakai's results to Poisson channels (see Proposition IV.4 and Corollary IV.6). Then as an application, we provide a new proof of relations (1.5) and (1.6) in Theorem V.3. As an intermediate result we also state and prove in Proposition V.2 extended De Bruijn identities analogous to [13, Relation (35)].

We proceed as follows. First in Section II we extend Relation (1.3) to the setting of classical Poisson channels. Secondly, we use the infinite-dimensional stochastic analysis methods presented in Section III to derive in Section IV an equivalent of (1.3) for infinite-dimensional Poisson channels using a Malliavin gradient for Poisson processes. This relation is used in Section V to give a new proof of (1.5) and (1.6) for general Poisson channels. Then in Section VI-A we generalize the results obtained in Section IV to a class of normal martingales which contains the continuous-time Poisson channel, the Gaussian one and a mixture of both (this class includes some martingales with jumps and non-independent increments). Finally in Section VI-B we extend relations (1.5) and (1.6) to a deterministic mixture of Gaussian and Poisson channels. We remark that we were not able to show relations of the type (1.5) and (1.6) for the processes with non-independent increments presented in Section VI-A (Example 2). This phenomenon was suggested in the last sentence of [5, Section VI] where it has been remarked that both Gaussian and Poisson processes share the independent increments property.

II. Poisson channel on N 

Let us briefly describe the Poisson channel on $\mathbb{N}$ (see 0T) for a survey on Poisson channels). Poisson channels differ from Gaussian channels in that the observed signal cannot be expressed as the sum of the input signal plus some additional noise, that is, in an "additive" way as in (1.1). Consider a positive input signal $X$ with distribution $\mu_X$. We assume the output $Y$ is a Poisson random variable on $\mathbb{N}$ with intensity $\alpha X + \lambda$,

\[ Y \sim \mathcal{P}(\alpha X + \lambda). \]
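The channel just described can be simulated directly. The following minimal sketch draws pairs $(X, Y)$ with $Y$ given $X$ distributed as $\mathcal{P}(\alpha X + \lambda)$; the uniform prior on $X$, the parameter values and the function names `sample_poisson` and `simulate_channel` are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def sample_poisson(intensity, rng):
    """Draw one Poisson(intensity) variate with Knuth's multiplication method."""
    threshold = math.exp(-intensity)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_channel(n_samples, alpha, lam, rng):
    """Simulate pairs (X, Y) with Y | X ~ Poisson(alpha * X + lam)."""
    pairs = []
    for _ in range(n_samples):
        x = rng.uniform(0.0, 2.0)  # assumed prior on the input, chosen for illustration
        pairs.append((x, sample_poisson(alpha * x + lam, rng)))
    return pairs


rng = random.Random(0)
pairs = simulate_channel(20000, alpha=2.0, lam=0.5, rng=rng)
mean_y = sum(y for _, y in pairs) / len(pairs)
# Under the assumed prior, E[Y] = alpha * E[X] + lam = 2.0 * 1.0 + 0.5 = 2.5.
```

The empirical mean of the output should be close to $\alpha E[X] + \lambda$, which gives a quick sanity check of the simulation.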

This setting is used for example in photo-detection problems where a photo-sensitive device (e.g. a p-i-n diode) is modeled by a Poisson channel. In this setting $\lambda$ is a residual current in the device called the "dark current noise" and $\alpha$ is some scale parameter. Note that contrary to the Gaussian channel, $\lambda$ and $\alpha$ cannot be replaced by a single coefficient, the "signal to noise ratio".
Let $\mu$ be the distribution of a Poisson random variable on $\mathbb{N}$ with intensity 1. Finally assume that the conditional law $\mu_{Y|X}(\cdot|x)$ is absolutely continuous with respect to $\mu$ (this condition implies that the joint distribution of $(X,Y)$ is absolutely continuous with respect to the measure $\mu_X\times\mu$), with density given by

\[ \frac{d\mu_{Y|X=x}}{d\mu}(y) = \exp(-((\lambda-1)+\alpha x))\,(\lambda+\alpha x)^y, \]

for $x\in\mathbb{R}_+$, $y\in\mathbb{N}$, and the law of $Y$ is absolutely continuous with respect to $\mu$ with density $m$,

\[ m(y) := \int_0^{+\infty}\frac{d\mu_{Y|X=x}}{d\mu}(y)\,\mu_X(dx), \quad y\in\mathbb{N}. \tag{II.1} \]



Now we can state the following lemma which will be extended in Section IV as Proposition IV.4 and Corollary IV.6.

Lemma II.1. The Bayesian estimator of the input $X$ of the channel with intensity $\lambda + \alpha X$ can be expressed as:

\[ E[X|Y] = \frac{m(Y+1)-m(Y)}{\alpha\,m(Y)} - \frac{\lambda-1}{\alpha}. \tag{II.2} \]
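Proof: Let $y$ in $\mathbb{N}$.

\[
\begin{aligned}
m(y+1)-m(y) &= \int_0^{+\infty}\frac{d\mu_{Y|X=x}}{d\mu}(y)\,(\lambda-1+\alpha x)\,\mu_X(dx) \\
&\overset{(*)}{=} m(y)\int_0^{+\infty}(\lambda-1+\alpha x)\,\frac{d\mu_{X|Y=y}}{d\mu_X}(x)\,\mu_X(dx) \\
&= m(y)\Big(\lambda-1+\alpha\int_0^{+\infty}x\,\mu_{X|Y=y}(dx)\Big) \\
&= m(y)\big(\lambda-1+\alpha E[X|Y=y]\big).
\end{aligned}
\]

Equality (*) is justified by a relation of the form iii) of Proposition IV.1. ■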

Remark II.2. The nonlinear filter of $X$ given in (II.2) can be numerically approximated thanks to a Monte-Carlo scheme (see Remark IV.7).
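As an illustration of this remark, here is a minimal Monte-Carlo sketch of (II.2): $m$ is replaced by an average of the likelihood over samples drawn from the law of $X$. The Exponential(1) prior, the parameter values and all function names are assumptions made for this example only. For comparison, the sketch also computes the self-normalised posterior mean over the same samples; the two quantities coincide algebraically, so the comparison checks the implementation rather than the accuracy of the Monte-Carlo approximation.

```python
import math
import random


def likelihood(y, x, alpha, lam):
    """Density of Y given X = x with respect to the Poisson(1) law (Section II)."""
    return math.exp(-((lam - 1.0) + alpha * x)) * (lam + alpha * x) ** y


def m_hat(y, samples, alpha, lam):
    """Monte-Carlo estimate of m(y): average of the likelihood over prior samples."""
    return sum(likelihood(y, x, alpha, lam) for x in samples) / len(samples)


def estimator_via_m(y, samples, alpha, lam):
    """Plug the Monte-Carlo estimate of m into formula (II.2)."""
    m0 = m_hat(y, samples, alpha, lam)
    m1 = m_hat(y + 1, samples, alpha, lam)
    return (m1 - m0) / (alpha * m0) - (lam - 1.0) / alpha


def estimator_direct(y, samples, alpha, lam):
    """Self-normalised posterior mean over the same samples, for comparison."""
    weights = [likelihood(y, x, alpha, lam) for x in samples]
    return sum(w * x for w, x in zip(weights, samples)) / sum(weights)


rng = random.Random(1)
samples = [rng.expovariate(1.0) for _ in range(5000)]  # assumed prior: Exponential(1)
estimate = estimator_via_m(3, samples, alpha=1.5, lam=0.7)
reference = estimator_direct(3, samples, alpha=1.5, lam=0.7)
```

Only evaluations of $m$ at $y$ and $y+1$ are needed, which is what makes the difference-operator form convenient numerically.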

Remark II.3. The conditional distributions used in Lemma II.1 are well defined in this context; one can refer to Propositions IV.1 and IV.2 for more details.

To obtain results for more general Poisson channels 
we have first to recall some elements of analysis on 
the Poisson space. 



III. Analysis on the Poisson space 

In this Section we introduce some elements of analysis on the Poisson space in a general framework. We will then describe these elements using a concrete example.

Let $(S,\mathcal{B}(S),\nu)$ be a measure space where $\nu$ is an intensity measure that is atomless and $\sigma$-finite. For an element $z$ in $S$ we denote by $\delta_z$ the Dirac measure at point $z$ on $(S,\mathcal{B}(S))$. Define the Poisson space $\Omega_S$ as

\[ \Omega_S := \Big\{ y = \sum_{k=1}^{n}\delta_{z_k},\ n\in\bar{\mathbb{N}},\ z_k\in S,\ 1\le k\le n \Big\} \]

with $\bar{\mathbb{N}} := \mathbb{N}\cup\{\infty\}$ and for $y = \sum_{k=1}^{n}\delta_{z_k}$, let

\[ C(y) := \{z_1,\ldots,z_n\}. \tag{III.1} \]

Define the canonical process $(N_A)_{A\in\mathcal{B}(S)}$ on $\Omega_S$ as

\[ N_A(y) := y(A), \quad y\in\Omega_S, \]

where $y(A)$ is the number of atoms of $y$ in the set $A$. We define the $\sigma$-field $\mathcal{F}_S$ on $\Omega_S$ with $\mathcal{F}_S = \sigma(\{y\mapsto y(B),\ B\in\mathcal{B}(S)\})$.

There exists a probability measure $P_S$ on $(\Omega_S,\mathcal{F}_S)$ called the Poisson measure such that,
• $\forall B\in\mathcal{B}(S)$, $\forall n\in\mathbb{N}$,

\[ P_S(\{y \mid y(B) = n\}) = \exp(-\nu(B))\,\frac{\nu(B)^n}{n!}. \]

• For disjoint subsets $(B_1,\ldots,B_n)$ in $\mathcal{B}(S)$, $y(B_1),\ldots,y(B_n)$ are $P_S$-independent.

Under $P_S$ the canonical process $(N_A)_{A\in\mathcal{B}(S)}$ is a Poisson process with intensity $\nu$.
Define $\mathcal{M}(S)$ as the set of non-negative measures on $(S,\mathcal{B}(S))$. Let $H_S$ be the space

\[ H_S = \Big\{ \omega\in\mathcal{M}(S),\ \exists h\in L^2_+(S,d\nu),\ \omega\ll\nu \text{ with } \frac{d\omega}{d\nu} = h \Big\} \]

where $L^2_+(S,d\nu)$ denotes the set of positive functions of $L^2(S,d\nu)$.

$H_S$ is equipped with an inner product $\langle\cdot,\cdot\rangle_{H_S}$ given by

\[ \langle\omega_1,\omega_2\rangle_{H_S} = \langle h_1,h_2\rangle_{L^2(S,d\nu)}, \quad \omega_1\in H_S,\ \omega_2\in H_S. \]

Note also that we will denote by $E$ the expectation with respect to the measure $\mu_X\times\mu_Y$, $E_1$ the expectation with respect to $\mu_Y$ and $E$ the expectation relative to $P_S$.
The Malliavin operator $\nabla$ we introduce will be of interest in Sections IV and V.

Let $L^0(\Omega_S,\mathcal{F}_S,P_S)$ be the space of measurable mappings from $(\Omega_S,\mathcal{F}_S,P_S)$ to $\mathbb{R}$. Define first the operator $D$ by,

\[ F \mapsto D_zF(y) := F(y+\delta_z) - F(y). \]

Technical justifications about the measurability of the previous map can be found in HI and references therein. We mention the following chain rule property for the Malliavin derivative $D$, that is, for all random variables $F$ and $G$ on $(\Omega_S,\mathcal{F}_S,P_S)$ we have that

\[ D_s(FG) = F\,D_sG + G\,D_sF + D_sF\,D_sG, \quad s\in S. \tag{III.2} \]

We also introduce the operator $I_1$ and the Malliavin integration by parts formula which will play an important role in Section V. For a deterministic function $h : S\to\mathbb{R}$ we denote by $I_1(h)$ the stochastic integral of $h$ against the martingale $y-\nu$, i.e.,

\[ I_1(h) := \int_S h(s)\,(dy_s - \nu(ds)). \]

Note that the stochastic integral is defined pathwise (in the sense of Stieltjes) as follows

\[ \int_S h(s)\,dy_s = \sum_{k=1}^{n}h(z_k), \quad \text{a.s.} \]

where $y(S) = n$ and $y = \sum_{k=1}^{n}\delta_{z_k}$.
Let $h$ be as above and let $F$ be a random variable on $(\Omega_S,\mathcal{F}_S,P_S)$; we have that

\[ E[F\,I_1(h)] = E\Big[\int_S D_sF\,h(s)\,\nu(ds)\Big]. \tag{III.3} \]

Note that the proof of this formula is done for example in [JS). We define the operator $\nabla$ which is an "integrated" version of $D$.

Definition III.1. For $F : \Omega_S\to\mathbb{R}$ we define $\nabla F$ as the $H_S$-valued random variable

\[ \nabla_AF := \int_A D_zF\,\nu(dz), \quad A\in\mathcal{B}(S). \]

We conclude this section by mentioning that the setting described above contains the canonical Poisson space $\Omega_{[0,T]}$ as a particular case, where $(S,\mathcal{B}(S),\nu) = ([0,T],\mathcal{B}([0,T]),d\pi)$ with $\pi$ being the Lebesgue measure on $[0,T]$ and

\[ \Omega_{[0,T]} := \Big\{ y = \sum_{k=1}^{n}\delta_{t_k},\ n\in\bar{\mathbb{N}},\ 0\le t_1<\ldots<t_n\le T \Big\}. \]

In this case $C(y)$ given by (III.1) is the set of the jump times of the path $y$ and under $P_{[0,T]}$, $(N_{[0,t]})_{t\in[0,T]}$ is a Poisson process with intensity $dt$, that is, the stochastic process $(N_t - t)_{t\in[0,T]}$ is a $P_{[0,T]}$-martingale.
In this case $H_{[0,T]}$ can be defined in a more tractable way by,

\[ H_{[0,T]} = \Big\{ \omega : [0,T]\to\mathbb{R},\ \exists h\in L^2([0,T]),\ \omega\ll\pi \text{ with } \frac{d\omega}{d\pi} = h \Big\} \]

equipped with

\[ \langle\omega_1,\omega_2\rangle_{H_{[0,T]}} := \langle h_1,h_2\rangle_{L^2([0,T])}, \quad \omega_1,\omega_2\in H_{[0,T]}. \]

Finally in the case of the classical Poisson space the Malliavin derivative $\nabla$ can be expressed in a different way: for $F : \Omega_{[0,T]}\to\mathbb{R}$, $\nabla F$ is an $H_{[0,T]}$-valued random variable and

\[ \nabla_tF := \int_0^t D_sF\,\pi(ds), \quad t\in[0,T]. \]

IV. Conditional mean estimators for Poisson channels

A. Some general facts about the Bayesian framework

We introduce in this Section the Bayesian framework and compute in Section IV-B the conditional mean estimator in the setting of Poisson point processes.

Let $X$ be an input signal with values in a space $(H,\sigma(H))$ with distribution $\mu_X$. Consider $(\Omega,\mathcal{F},P)$ a probability space and assume the output $Y$ lies in $\Omega$. We make the following assumptions:

(H1) For all $x$ in $H$, $\mu_{Y|X=x}$ (the distribution of $Y$ given $X = x$) is absolutely continuous with respect to $P$ and we denote by $L$ the corresponding Radon-Nikodym density.
(H2) $L$ is $(\sigma(H)\otimes\mathcal{F})$-measurable.
Then, the following function

\[ H\times\mathcal{F}\to[0,1], \quad (x,B)\mapsto\mu_{Y|X=x}(B) \]

is a transition probability in the sense of [7, Definition III-2-1 p. 69]. Moreover the joint distribution $\mu$ of $(X,Y)$ is a probability measure on $(H\times\Omega,\sigma(H)\otimes\mathcal{F})$ such that,

\[ \mu(A\times B) = \int_A\mu_{Y|X=x}(B)\,\mu_X(dx), \quad A\times B\in\sigma(H)\otimes\mathcal{F}. \tag{IV.1} \]

Denote by $M$ the marginal distribution of $\mu$ on $(\Omega,\mathcal{F})$ defined by,

\[ M(B) := \mu(H\times B), \quad B\in\mathcal{F}. \tag{IV.2} \]

Proposition IV.1 is mainly devoted to showing the existence of the following transition probability

\[ \Omega\times\sigma(H)\to[0,1], \quad (y,A)\mapsto\mu_{X|Y}(A|y) \tag{IV.3} \]

and that the couple $(M,(\mu_{X|Y}(\cdot|y))_{y\in\Omega})$ allows us to recover $\mu$ as

\[ \mu(A\times B) = \int_B\mu_{X|Y}(A|y)\,M(dy), \quad A\times B\in\sigma(H)\otimes\mathcal{F}. \tag{IV.4} \]

Proposition IV.1. If (H1) and (H2) are satisfied then

i) $\mu$ is absolutely continuous with respect to $\mu_X\times P$ and the corresponding Girsanov-Radon-Nikodym density is $L$.

ii) $M$ is absolutely continuous with respect to $P$. Let $m$ be a version of $dM/dP$.

iii) For $M$-almost all $y$ in $\Omega$, $\mu_{X|Y=y}$ is absolutely continuous with respect to $\mu_X$ and for $y$ such that $m(y)\neq 0$, the Radon-Nikodym density is given by,

\[ \frac{d\mu_{X|Y=y}}{d\mu_X}(x) = \frac{L(y,x)}{m(y)}. \]

iv) For a $(\sigma(H)\otimes\mathcal{F})$-measurable function $f : H\times\Omega\to\mathbb{R}$,

\[ \int_\Omega\int_H f(x,y)\,\mu_{X|Y=y}(dx)\,M(dy) = \int_H\int_\Omega f(x,y)\,\mu_{Y|X=x}(dy)\,\mu_X(dx). \]

Proof: See for example HI Section 4.2.1, p. 126], or E Section A.3, p. 623-626]. ■

Now we will make use of the general Bayesian framework described above.

B. General Poisson channels

Let $(S,\mathcal{B}(S),\nu)$ and $H_S$ be as in Section III. We denote by $(\Omega_S,\mathcal{F}_S,P_S)$ the Poisson space introduced in Section III and assume that under the probability measure $P_S$, the output process $Y$ is a Poisson point process with intensity measure $\nu$. Let in addition $\lambda$ and $\alpha$ be positive numbers. Let $X$ be the input random variable with values in $H_S$ such that $\int_H\int_S x_z\,\nu(dz)\,\mu_X(dx) < \infty$. Then, by the Girsanov theorem (see for example [10, Theorem 3.1.1, p. 78]), $\mu_{Y|X}(\cdot|x)$, the conditional probability on $\Omega$ given $X = x$, is absolutely continuous with respect to $P_S$ and the Girsanov-Radon-Nikodym density denoted $L$ is given by

\[ L(y,x) := \frac{d\mu_{Y|X}(\cdot|x)}{dP_S}(y) = \exp\Big(-(\lambda-1)\nu(S) - \alpha\int_S x_z\,\nu(dz)\Big)\prod_{k=1}^{y(S)}(\lambda+\alpha x(z_k)), \tag{IV.5} \]

where $y(S) = n$ and $y = \sum_{k=1}^{n}\delta_{z_k}$. In other words, under the probability measure $\mu_{Y|X}(\cdot|x)$, the stochastic process $Y$ is a Poisson process with intensity $\lambda+\alpha x$.

Proposition IV.2. Assume that hypotheses (H1) and (H2) are in force. Let $B_A(Y) := E[X(A)|\mathcal{Y}]$ for $A\in\mathcal{B}(S)$. Then

\[ B_A(y) = \int_H x(A)\,\mu_{X|Y}(dx|y), \quad \text{for } M\text{-almost every } y. \tag{IV.6} \]

Remark IV.3. Note that the expression (IV.6) is theoretical and cannot be used in practice. In contradistinction, relation (IV.9) obtained below enables a numerical approximation of the Bayesian estimator as mentioned in Remark IV.7.

In fact it is more tractable to estimate the densities rather than the intensity measures. So we denote by $\dot{X}$ the $L^2(S,d\nu)$-valued random variable associated to $X$. For $z\in S$, (IV.6) can be rewritten as

\[ B_z(y) = E[\dot{X}_z|Y = y] = \int_{H_S}\dot{x}_z\,\mu_{X|Y}(dx|y), \quad M\text{-a.e.} \tag{IV.7} \]

We can state the main result of this paper. It allows us to express the Bayesian estimator of the input as a discrete logarithmic Malliavin gradient of the likelihood ratio $m$. We recall that

\[ m(y) = \int_{H_S}L(y,x)\,\mu_X(dx), \quad y\in\Omega_S. \tag{IV.8} \]

Proposition IV.4. Assume that hypotheses (H1) and (H2) are satisfied; then for $M$-almost every $y$ we have that
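\[ E[X(A)|Y = y] = \frac{\nabla_Am(y)}{\alpha\,m(y)} - \frac{(\lambda-1)\,\nu(A)}{\alpha}, \quad A\in\mathcal{B}(S). \tag{IV.9} \]

Proof: For $y$ in $\Omega_S$ we set $y(S) = n$, $y = \sum_{k=1}^{n}\delta_{z_k}$, and let $C(y)$ be the set defined by (III.1). Let $z$ be in $S$; we have that

\[
\begin{aligned}
D_zm(y) &= m(y+\delta_z) - m(y) \\
&= \int_{H_S}L(y,x)\,1_{z\notin C(y)}\,[(\lambda-1)+\alpha\dot{x}_z]\,\mu_X(dx).
\end{aligned}
\]

So

\[
\begin{aligned}
\nabla_Am(y) &= \int_A D_zm(y)\,\nu(dz) \\
&= \int_{H_S}L(y,x)\int_A 1_{z\notin C(y)}\,[(\lambda-1)+\alpha\dot{x}_z]\,\nu(dz)\,\mu_X(dx) \\
&= \int_{H_S}L(y,x)\int_A[(\lambda-1)+\alpha\dot{x}_z]\,\nu(dz)\,\mu_X(dx), \quad \text{as } \nu \text{ is atomless,} \\
&= \int_{H_S}L(y,x)\,[(\lambda-1)\nu(A)+\alpha x(A)]\,\mu_X(dx) \\
&= \int_{H_S}[(\lambda-1)\nu(A)+\alpha x(A)]\,m(y)\,\mu_{X|Y}(dx|y),
\end{aligned}
\]

by iii) of Proposition IV.1. By Proposition IV.2 we have that

\[ E[X(A)|\mathcal{Y}] = \frac{\nabla_Am(Y)}{\alpha\,m(Y)} - \frac{(\lambda-1)\,\nu(A)}{\alpha}, \quad A\in\mathcal{B}(S). \]

Remarks IV.5.
• Neither $\nabla$ nor $D$ satisfies the chain rule of derivation, and consequently

\[ \frac{\nabla F}{F}\neq\nabla\log F. \]

• We have shown in Proposition IV.4 that, for $\nu$-almost every $z$ in $S$,

\[ E[\dot{X}_z|Y = y] = \frac{D_zm(y)}{\alpha\,m(y)} - \frac{\lambda-1}{\alpha}. \tag{IV.10} \]

We conclude this section with a more explicit case, that is the classical Poisson process on a time interval $[0,T]$ equipped with the Lebesgue measure $\pi$. More precisely, let $(X_t)_{t\in[0,T]}$ be an input signal with values in $H_{[0,T]}$ (see Section III). The output $(Y_t)_{t\in[0,T]}$ is supposed to be a Poisson process with intensity $\lambda+\alpha X$ where $\lambda$ and $\alpha$ are some fixed parameters.

The likelihood denoted by $L$ is given by

\[ L(y,x) = \exp\Big(-(\lambda-1)T - \alpha\int_0^T x_s\,\pi(ds)\Big)\prod_{k=1}^{y([0,T])}(\lambda+\alpha x_{z_k}), \]

where $z_k\in C(y)$.

Proposition IV.4 becomes the following corollary.

Corollary IV.6. Under the assumptions of Proposition IV.4 we have that

\[ E[X_t|\mathcal{Y}] = \frac{\nabla_tm(Y)}{\alpha\,m(Y)} - \frac{(\lambda-1)\,t}{\alpha}, \quad t\in[0,T]. \tag{IV.11} \]

Remark IV.7. The nonlinear filter given by equations (IV.9) and (IV.11) can be numerically approximated by evaluating $m$ in (IV.8) by a Monte-Carlo scheme. This computation is really tractable since the Malliavin derivative $\nabla$ is a difference operator.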
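The Monte-Carlo scheme of Remark IV.7 can be sketched as follows for the classical Poisson space on $[0,T]$. The sketch uses the pointwise relation obtained in the proof of Proposition IV.4, namely $D_s m(y) = m(y)\,[(\lambda-1) + \alpha E[\dot{X}_s|Y=y]]$ for $s$ outside the jump times of $y$, with $m$ replaced by an average of the likelihood over simulated input densities. The piecewise-constant representation of the densities on a uniform grid, the prior used to draw them, the parameter values and all function names are assumptions made for illustration.

```python
import math
import random


def likelihood(jumps, x, alpha, lam, T):
    """L(y, x) for a path with jump times `jumps` and a piecewise-constant
    input density `x` given by its values on a uniform grid of [0, T]."""
    dt = T / len(x)
    value = math.exp(-(lam - 1.0) * T - alpha * sum(x) * dt)
    for t in jumps:
        value *= lam + alpha * x[min(int(t / dt), len(x) - 1)]
    return value


def m_hat(jumps, paths, alpha, lam, T):
    """Monte-Carlo estimate of m(y) in (IV.8)."""
    return sum(likelihood(jumps, x, alpha, lam, T) for x in paths) / len(paths)


def density_estimate(s, jumps, paths, alpha, lam, T):
    """Estimate of E[density of X at s | Y = y] via D_s m(y) = m(y + delta_s) - m(y)."""
    m0 = m_hat(jumps, paths, alpha, lam, T)
    m1 = m_hat(list(jumps) + [s], paths, alpha, lam, T)
    return (m1 - m0) / (alpha * m0) - (lam - 1.0) / alpha


def density_direct(s, jumps, paths, alpha, lam, T):
    """Self-normalised posterior mean of the density at s, for comparison."""
    dt = T / len(paths[0])
    k = min(int(s / dt), len(paths[0]) - 1)
    weights = [likelihood(jumps, x, alpha, lam, T) for x in paths]
    return sum(w * x[k] for w, x in zip(weights, paths)) / sum(weights)


rng = random.Random(2)
# Assumed prior: 4 grid cells on [0, 1], each density value uniform on (0, 3).
paths = [[rng.uniform(0.0, 3.0) for _ in range(4)] for _ in range(2000)]
observed_jumps = [0.10, 0.35, 0.80]  # hypothetical observed jump times
estimate = density_estimate(0.5, observed_jumps, paths, alpha=1.5, lam=0.7, T=1.0)
reference = density_direct(0.5, observed_jumps, paths, alpha=1.5, lam=0.7, T=1.0)
```

Evaluating the difference operator only requires adding one atom to the observed configuration and recomputing the average, which is the tractability mentioned in the remark.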

V. Mutual information and conditional 
mean estimation: the Poisson case 

In this section we present the second main result of this paper (Theorem V.3), i.e. the use of relation (IV.10) to recover (in a different manner) a relation between the mutual information of general Poisson channels and the conditional mean estimator of the input which has been established recently in [5, Theorems 3-4] (see also [3]). We stress that we propose a new proof of this result involving Malliavin calculus and stochastic analysis arguments related to the tools involved in [13] to solve the same problem for additive Gaussian channels. In addition our results are valid for general Poisson channels. Finally we provide extended De Bruijn identities (in Proposition V.2) of the form of those obtained in [13, Section VI]. Throughout this section we assume that hypotheses (H1) and (H2) of Section IV are in force. First we state and prove the following lemma.
Lemma V.1. For any $z$ in $S$ we have that

\[ D_z\log(m) = \log\Big(1+\frac{D_zm}{m}\Big) = \log\big(E[\lambda+\alpha X_z|\mathcal{Y}]\big), \quad P\text{-a.s.} \]

Proof: First we recall that the last equality follows from relation (IV.10). Then from the definition of the Malliavin derivative $D$ we have that

\[ 1+\frac{D_zm(y)}{m(y)} = \frac{m(y+\delta_z)}{m(y)} \]

leading to

\[
\begin{aligned}
\log\Big(1+\frac{D_zm(y)}{m(y)}\Big) &= \log\Big(\frac{m(y+\delta_z)}{m(y)}\Big) \\
&= \log m(y+\delta_z) - \log m(y) \\
&= D_z(\log m)(y).
\end{aligned}
\]

■ 
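The difference operator $D$ and the identity of Lemma V.1 are easy to check numerically on finite configurations. In the minimal sketch below a configuration $y$ is represented by a tuple of atoms, and the functionals `F` and `m` are hypothetical choices made for illustration (in particular `m` is just a positive functional, not the likelihood ratio of the paper). The sketch also checks the product rule (III.2).

```python
import math


def D(F, z):
    """Return the functional y -> F(y + delta_z) - F(y); y is a tuple of atoms."""
    return lambda y: F(y + (z,)) - F(y)


def F(y):
    """Number of atoms of the configuration (illustrative functional)."""
    return float(len(y))


def m(y):
    """A positive functional of the configuration (illustrative choice)."""
    return math.prod(1.0 + t for t in y)


y, z = (0.2, 0.7), 0.4

# Lemma V.1: D_z log(m) = log(1 + D_z m / m).
log_lhs = D(lambda c: math.log(m(c)), z)(y)
log_rhs = math.log(1.0 + D(m, z)(y) / m(y))

# Product rule (III.2): D(FG) = F DG + G DF + DF DG, here with G = m.
prod_lhs = D(lambda c: F(c) * m(c), z)(y)
prod_rhs = F(y) * D(m, z)(y) + m(y) * D(F, z)(y) + D(F, z)(y) * D(m, z)(y)
```

For this choice of `m`, adding the atom `z` multiplies `m` by `1 + z`, so both sides of the first identity equal `log(1.4)`.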

In the next Proposition we present extended De Bruijn identities which are the counterpart of [13, Section VI relation (35)]. These relations will be necessary in the proof of Theorem V.3. We introduce the following condition

\[ E\Big[\int_S|X_s\log(X_s)|\,\nu(ds)\Big] < \infty \tag{V.1} \]

which ensures by Jensen's inequality that the Bayesian risk defined as

\[ E\Big[\int_S\big(X_s\log(X_s) - E[X_s|\mathcal{Y}]\log(E[X_s|\mathcal{Y}])\big)\,\nu(ds)\Big] \]

is finite.

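Proposition V.2 (Extended De Bruijn identities). Assume that condition (V.1) is satisfied; then the relations i) and ii) below hold.

i)
\[
\begin{aligned}
\frac{d}{d\alpha}E_1[\log m] &= \frac{1}{\alpha}E_1\Big[\int_S\psi_\lambda\big(E[\lambda+\alpha X_s|\mathcal{Y}]\big)\,\nu(ds)\Big] \\
&= \frac{1}{\alpha}E_1\Big[\int_S\psi_\lambda\Big(1+\frac{D_sm}{m}\Big)\,\nu(ds)\Big]
\end{aligned}
\]

where $\psi_\lambda(x) := (x-\lambda)\log(x)$ and

ii)
\[
\begin{aligned}
\frac{d}{d\lambda}E_1[\log m] &= E_1\Big[\int_S\log\big(E[\lambda+\alpha X_s|\mathcal{Y}]\big)\,\nu(ds)\Big] \\
&= E_1\Big[\int_S\log\Big(1+\frac{D_sm}{m}\Big)\,\nu(ds)\Big].
\end{aligned}
\]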



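Proof:

i) In the following computations we will use the integration by parts formula and the relation

\[ D_sL(y,x) = L(y,x)\,(\lambda-1+\alpha x_s) \tag{V.2} \]

which has been obtained in the proof of Proposition IV.4. We recall that $E_1$ denotes the expectation with respect to $m\,dP$ and that $E$ denotes the expectation under $P$. We follow the main lines of the proof of [13, Proposition 5.1] with however significant differences like the use of the Malliavin integration by parts formula. Since $E[m] = 1$, the term $E[\partial m/\partial\alpha]$ vanishes and

\[
\begin{aligned}
\frac{d}{d\alpha}E_1[\log m] &= \frac{d}{d\alpha}E[m\log m] = E\Big[\log m\,\frac{\partial m}{\partial\alpha}\Big] \\
&= E\Big[\log m\int_H\frac{\partial}{\partial\alpha}\log L(y,x)\,L(y,x)\,\mu_X(dx)\Big] \\
&= E\Big[\log m\int_H L(y,x)\Big(-\int_S x_s\,\nu(ds)+\int_S\frac{x_s}{\lambda+\alpha x_s}\,dy_s\Big)\mu_X(dx)\Big] \\
&= \int_H\Big(E\Big[\log m\,L(y,x)\int_S\frac{-\alpha x_s^2+(1-\lambda)x_s}{\lambda+\alpha x_s}\,\nu(ds)\Big] + E\Big[\log m\,L(y,x)\,I_1\Big(\frac{x}{\lambda+\alpha x}\Big)\Big]\Big)\mu_X(dx)
\end{aligned}
\]

where we recall that under $P$ the stochastic process $y$ is a Poisson process with intensity $\nu$ and by definition of the operator $I_1$ we have that

\[ I_1\Big(\frac{x}{\lambda+\alpha x}\Big) = \int_S\frac{x_s}{\lambda+\alpha x_s}\,(dy_s-\nu(ds)). \]

As a consequence:

\[
\frac{d}{d\alpha}E_1[\log m] = \int_H\Big(E\Big[\log m\,L(y,x)\int_S\frac{-\alpha x_s^2+(1-\lambda)x_s}{\lambda+\alpha x_s}\,\nu(ds)\Big] + E\Big[\int_S D_s(\log m\,L(y,x))\,\frac{x_s}{\lambda+\alpha x_s}\,\nu(ds)\Big]\Big)\mu_X(dx)
\]

where the last equality is obtained using the integration by parts formula (III.3). Applying the chain rule formula for the Malliavin derivative (III.2) we deduce that

\[
\begin{aligned}
\frac{d}{d\alpha}E_1[\log m] = \int_H\Big(&E\Big[\log m\,L(y,x)\int_S\frac{-\alpha x_s^2+(1-\lambda)x_s}{\lambda+\alpha x_s}\,\nu(ds)\Big] \\
&+ E\Big[\int_S D_s(\log m)\,L(y,x)\,\frac{x_s}{\lambda+\alpha x_s}\,\nu(ds)\Big] \\
&+ E\Big[\int_S D_s(\log m)\,D_sL(y,x)\,\frac{x_s}{\lambda+\alpha x_s}\,\nu(ds)\Big] \\
&+ E\Big[\log m\int_S D_sL(y,x)\,\frac{x_s}{\lambda+\alpha x_s}\,\nu(ds)\Big]\Big)\mu_X(dx).
\end{aligned}
\]

In addition relation (V.2) and the preceding expression entail that

\[
\begin{aligned}
\frac{d}{d\alpha}E_1[\log m] &= \int_H\Big(E\Big[\log m\,L(y,x)\int_S\frac{-\alpha x_s^2+(1-\lambda)x_s}{\lambda+\alpha x_s}\,\nu(ds)\Big] + E\Big[\int_S D_s(\log m)\,L(y,x)\,\frac{x_s}{\lambda+\alpha x_s}\,\nu(ds)\Big] \\
&\qquad + E\Big[\int_S D_s(\log m)\,L(y,x)\,\frac{(\lambda-1+\alpha x_s)x_s}{\lambda+\alpha x_s}\,\nu(ds)\Big] + E\Big[\log m\,L(y,x)\int_S\frac{(\lambda-1+\alpha x_s)x_s}{\lambda+\alpha x_s}\,\nu(ds)\Big]\Big)\mu_X(dx) \\
&= \int_H E\Big[\int_S D_s(\log m)\,L(y,x)\,x_s\,\nu(ds)\Big]\mu_X(dx) \\
&= E\Big[\int_S D_s(\log m)\int_H x_s\,L(y,x)\,\mu_X(dx)\,\nu(ds)\Big] \\
&= E\Big[\int_S D_s(\log m)\,m\int_H x_s\,\mu_{X|Y}(dx)\,\nu(ds)\Big] \\
&= E_1\Big[\int_S D_s(\log m)\,E[X_s|\mathcal{Y}]\,\nu(ds)\Big] \\
&= E_1\Big[\int_S\log\Big(1+\frac{D_sm}{m}\Big)E[X_s|\mathcal{Y}]\,\nu(ds)\Big] \\
&= \frac{1}{\alpha}E_1\Big[\int_S\log\Big(1+\frac{D_sm}{m}\Big)\Big(1+\frac{D_sm}{m}-\lambda\Big)\nu(ds)\Big]
\end{aligned}
\]

where the last equality comes from Lemma V.1 and relation (IV.10).

ii) The proof is similar to the proof of point i). However, to make this paper self-contained we present the main arguments in the following computations.

\[
\begin{aligned}
\frac{d}{d\lambda}E_1[\log m] &= \frac{d}{d\lambda}E[m\log m] = E\Big[\log m\,\frac{\partial m}{\partial\lambda}\Big] \\
&= E\Big[\log m\int_H\frac{\partial}{\partial\lambda}\log L(y,x)\,L(y,x)\,\mu_X(dx)\Big] \\
&= E\Big[\log m\int_H L(y,x)\Big(-\nu(S)+\int_S\frac{1}{\lambda+\alpha x_s}\,dy_s\Big)\mu_X(dx)\Big] \\
&= \int_H\Big(E\Big[\log m\,L(y,x)\int_S\frac{-\alpha x_s-(\lambda-1)}{\lambda+\alpha x_s}\,\nu(ds)\Big] + E\Big[\log m\,L(y,x)\,I_1\Big(\frac{1}{\lambda+\alpha x}\Big)\Big]\Big)\mu_X(dx) \\
&= \int_H\Big(E\Big[\log m\,L(y,x)\int_S\frac{-\alpha x_s-(\lambda-1)}{\lambda+\alpha x_s}\,\nu(ds)\Big] + E\Big[\int_S D_s(\log m\,L(y,x))\,\frac{1}{\lambda+\alpha x_s}\,\nu(ds)\Big]\Big)\mu_X(dx) \\
&= \int_H\Big(E\Big[\log m\,L(y,x)\int_S\frac{-\alpha x_s-(\lambda-1)}{\lambda+\alpha x_s}\,\nu(ds)\Big] + E\Big[\int_S D_s(\log m)\,L(y,x)\,\frac{1}{\lambda+\alpha x_s}\,\nu(ds)\Big] \\
&\qquad + E\Big[\int_S D_s(\log m)\,L(y,x)\,\frac{\lambda-1+\alpha x_s}{\lambda+\alpha x_s}\,\nu(ds)\Big] + E\Big[\log m\,L(y,x)\int_S\frac{\lambda-1+\alpha x_s}{\lambda+\alpha x_s}\,\nu(ds)\Big]\Big)\mu_X(dx) \\
&= \int_H E\Big[\int_S D_s(\log m)\,L(y,x)\,\nu(ds)\Big]\mu_X(dx) \\
&= E\Big[\int_S D_s(\log m)\,m\int_H\mu_{X|Y}(dx)\,\nu(ds)\Big] \\
&= E_1\Big[\int_S\log\Big(1+\frac{D_sm}{m}\Big)\nu(ds)\Big].
\end{aligned}
\]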
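The two identities can be checked numerically in a scalar analogue, namely the Poisson channel on $\mathbb{N}$ of Section II, where $m(y+1)/m(y) = E[\lambda+\alpha X|Y=y]$ plays the role of $1 + D_sm/m$. This scalar setting is an assumption of the sketch (the proposition itself is stated for an atomless intensity measure), as are the discrete prior, the truncation level and the parameter values. The derivatives of $E_1[\log m]$ are approximated by central finite differences and compared with the right-hand sides.

```python
import math

XS, WS = [0.5, 1.0, 2.0], [0.2, 0.5, 0.3]  # assumed discrete prior on X
Y_MAX = 60                                 # truncation of the sums over y


def m(y, alpha, lam):
    """Likelihood ratio m(y) of (II.1) for the discrete prior."""
    return sum(w * math.exp(-((lam - 1.0) + alpha * x)) * (lam + alpha * x) ** y
               for x, w in zip(XS, WS))


def law_of_y(y, alpha, lam):
    """P(Y = y) = m(y) times the Poisson(1) mass at y."""
    return m(y, alpha, lam) * math.exp(-1.0) / math.factorial(y)


def expected_log_m(alpha, lam):
    """E_1[log m], truncated at Y_MAX."""
    return sum(law_of_y(y, alpha, lam) * math.log(m(y, alpha, lam))
               for y in range(Y_MAX + 1))


def rhs_alpha(alpha, lam):
    """(1/alpha) E_1[psi_lambda(E[lam + alpha X | Y])], psi_lambda(x) = (x - lam) log x."""
    total = 0.0
    for y in range(Y_MAX + 1):
        cond = m(y + 1, alpha, lam) / m(y, alpha, lam)  # E[lam + alpha X | Y = y]
        total += law_of_y(y, alpha, lam) * (cond - lam) * math.log(cond)
    return total / alpha


def rhs_lambda(alpha, lam):
    """E_1[log E[lam + alpha X | Y]]."""
    return sum(law_of_y(y, alpha, lam)
               * math.log(m(y + 1, alpha, lam) / m(y, alpha, lam))
               for y in range(Y_MAX + 1))


def central_difference(f, t, h=1e-5):
    return (f(t + h) - f(t - h)) / (2.0 * h)


alpha, lam = 1.5, 0.7
lhs_alpha = central_difference(lambda a: expected_log_m(a, lam), alpha)
lhs_lambda = central_difference(lambda l: expected_log_m(alpha, l), lam)
```

Agreement of `lhs_alpha` with `rhs_alpha(alpha, lam)` and of `lhs_lambda` with `rhs_lambda(alpha, lam)` up to the finite-difference error illustrates i) and ii) in this simplified setting.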
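Theorem V.3. Assume that condition (V.1) is satisfied; then we have that

i)
\[
\begin{aligned}
\frac{d}{d\alpha}I(X;Y) &= \frac{1}{\alpha}E\Big[\int_S\psi_\lambda(\alpha X_s+\lambda) - \psi_\lambda\big(E[\alpha X_s+\lambda|\mathcal{Y}]\big)\,\nu(ds)\Big] \\
&= \frac{1}{\alpha}E\Big[\int_S\psi_\lambda(\alpha X_s+\lambda) - \psi_\lambda\Big(1+\frac{D_sm}{m}\Big)\,\nu(ds)\Big]
\end{aligned}
\]

where $\psi_\lambda(x) := (x-\lambda)\log(x)$.

ii)
\[
\begin{aligned}
\frac{d}{d\lambda}I(X;Y) &= E\Big[\int_S\log(\alpha X_s+\lambda) - \log\big(E[\alpha X_s+\lambda|\mathcal{Y}]\big)\,\nu(ds)\Big] \\
&= E\Big[\int_S\log(\alpha X_s+\lambda) - \log\Big(1+\frac{D_sm}{m}\Big)\,\nu(ds)\Big].
\end{aligned}
\]

Proof: First we have that

\[
\begin{aligned}
I(X;Y) &= \int_{H\times\Omega}\Big[\log\Big(\frac{d\mu_{Y|X}(\cdot|x)}{dP}(y)\Big) - \log\Big(\frac{d\mu_Y}{dP}(y)\Big)\Big]\mu(dx,dy) \\
&= \int_H\int_\Omega\log L(y,x)\,\mu_{Y|X=x}(dy)\,\mu_X(dx) - E_1[\log(m)].
\end{aligned}
\]

i) We have that

\[
\begin{aligned}
\frac{d}{d\alpha}\int_H\int_\Omega\log L(y,x)\,\mu_{Y|X=x}(dy)\,\mu_X(dx)
&= \frac{d}{d\alpha}\int_H\int_\Omega\Big[-\int_S(\lambda-1)+\alpha x_s\,\nu(ds) + \int_S\log(\lambda+\alpha x_s)\,dy_s\Big]\mu_{Y|X=x}(dy)\,\mu_X(dx) \\
&= \frac{d}{d\alpha}\int_H\int_S-(\lambda-1)-\alpha x_s+\log(\lambda+\alpha x_s)(\lambda+\alpha x_s)\,\nu(ds)\,\mu_X(dx) \\
&= \int_H\int_S x_s\log(\lambda+\alpha x_s)\,\nu(ds)\,\mu_X(dx) \\
&= E\Big[\int_S X_s\log(\lambda+\alpha X_s)\,\nu(ds)\Big]. 
\end{aligned}
\tag{V.3}
\]

By Proposition V.2 i) it holds that

\[
\begin{aligned}
\frac{d}{d\alpha}E_1[\log m] &= \frac{1}{\alpha}E\Big[\int_S\psi_\lambda\big(E[\lambda+\alpha X_s|\mathcal{Y}]\big)\,\nu(ds)\Big] \\
&= \frac{1}{\alpha}E\Big[\int_S\log\big(E[\lambda+\alpha X_s|\mathcal{Y}]\big)\,E[\alpha X_s|\mathcal{Y}]\,\nu(ds)\Big]. 
\end{aligned}
\tag{V.4}
\]

Since $\psi_\lambda(\alpha X_s+\lambda) = \alpha X_s\log(\lambda+\alpha X_s)$, relations (V.3) and (V.4) lead to the result.

ii) Similarly we have that

\[
\begin{aligned}
\frac{d}{d\lambda}\int_H\int_\Omega\log L(y,x)\,\mu_{Y|X=x}(dy)\,\mu_X(dx)
&= \frac{d}{d\lambda}\int_H\int_\Omega\Big[-\int_S(\lambda-1)+\alpha x_s\,\nu(ds) + \int_S\log(\lambda+\alpha x_s)\,dy_s\Big]\mu_{Y|X=x}(dy)\,\mu_X(dx) \\
&= \frac{d}{d\lambda}\int_H\int_S-(\lambda-1)-\alpha x_s+\log(\lambda+\alpha x_s)(\lambda+\alpha x_s)\,\nu(ds)\,\mu_X(dx) \\
&= \int_H\int_S\log(\lambda+\alpha x_s)\,\nu(ds)\,\mu_X(dx) \\
&= E\Big[\int_S\log(\lambda+\alpha X_s)\,\nu(ds)\Big]. 
\end{aligned}
\tag{V.5}
\]

By Proposition V.2 ii) we have that

\[ \frac{d}{d\lambda}E_1[\log m] = E\Big[\int_S\log\big(E[\lambda+\alpha X_s|\mathcal{Y}]\big)\,\nu(ds)\Big]. \tag{V.6} \]

We conclude from relations (V.5) and (V.6).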
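As a sanity check, the two derivative formulas can be tested numerically in the scalar Poisson channel $Y\sim\mathcal{P}(\alpha X+\lambda)$ of Section II, where they read $\frac{d}{d\alpha}I = E[X\log(\alpha X+\lambda)] - E[E[X|Y]\log(\alpha E[X|Y]+\lambda)]$ and $\frac{d}{d\lambda}I = E[\log(\alpha X+\lambda)] - E[\log(\alpha E[X|Y]+\lambda)]$. The scalar setting, the discrete prior, the truncation level and the parameter values are assumptions of this sketch; the mutual information is computed by direct summation and differentiated by central finite differences.

```python
import math

XS, WS = [0.5, 1.0, 2.0], [0.2, 0.5, 0.3]  # assumed discrete prior on X
Y_MAX = 60                                 # truncation of the sums over y


def pmf(y, x, alpha, lam):
    """P(Y = y | X = x) for the Poisson channel with intensity alpha * x + lam."""
    rate = lam + alpha * x
    return math.exp(-rate) * rate ** y / math.factorial(y)


def mutual_information(alpha, lam):
    total = 0.0
    for y in range(Y_MAX + 1):
        p_y = sum(w * pmf(y, x, alpha, lam) for x, w in zip(XS, WS))
        for x, w in zip(XS, WS):
            p = pmf(y, x, alpha, lam)
            total += w * p * math.log(p / p_y)
    return total


def rhs(alpha, lam, g):
    """E[g(X) log(alpha X + lam)] - E[g(E[X|Y]) log(alpha E[X|Y] + lam)]."""
    first = sum(w * g(x) * math.log(alpha * x + lam) for x, w in zip(XS, WS))
    second = 0.0
    for y in range(Y_MAX + 1):
        joint = [w * pmf(y, x, alpha, lam) for x, w in zip(XS, WS)]
        p_y = sum(joint)
        cond_x = sum(j * x for j, x in zip(joint, XS)) / p_y  # E[X | Y = y]
        second += p_y * g(cond_x) * math.log(alpha * cond_x + lam)
    return first - second


def central_difference(f, t, h=1e-5):
    return (f(t + h) - f(t - h)) / (2.0 * h)


alpha, lam = 1.5, 0.7
dI_dalpha = central_difference(lambda a: mutual_information(a, lam), alpha)
dI_dlambda = central_difference(lambda l: mutual_information(alpha, l), lam)
rhs_alpha = rhs(alpha, lam, lambda x: x)     # scalar form of i)
rhs_lambda = rhs(alpha, lam, lambda x: 1.0)  # scalar form of ii)
```

By Jensen's inequality the derivative with respect to $\lambda$ is non-positive, which matches the interpretation of $\lambda$ as a dark current that degrades the channel.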



VI. A generalization to a class of non-Gaussian and non-Poisson channels

A. The conditional mean estimator formula 

In this Section we give a generalization of results from Sections IV and V. We use some notations and definitions presented in Section VII.
Let $Y := (Y_t)_{t\in[0,T]}$ be a normal martingale on a probability space $(\Omega,\mathcal{F},P)$ with a right-continuous filtration $(\mathcal{F}_t)_{t\in[0,T]}$, that is:
• $E[Y_t^2] < \infty$, $t\in[0,T]$,
• $E[Y_t|\mathcal{F}_s] = Y_s$, $0\le s<t\le T$ and
• $E[(Y_t-Y_s)^2|\mathcal{F}_s] = t-s$, $0\le s<t\le T$.
In addition we assume that there exists a predictable function $\phi := (\phi_t)_{t\in[0,T]}$ such that the stochastic process

-t 



Y t -t 



dY s , t e [0, T] 



(VI.1) 



Ei [log m] 



is a martingale. Finally, we assume that $Y$ has the chaos representation property (see Definition VII.2). We present two examples of such processes.
Example 1) Assume <p '■= (0t)te[o,T] appearing in the Lemma VI. 1. With notations of Definition WII.H we

structure equation dVI.lt is deterministic. Then have, 
{Yt)te[o,T] has the chaos representation property oo 

see ED, and (Y t ) te[0tT] can be represented as My, x) = -r4.((A - 1 + ax)® n ), 



n! 

n=0 



dY t = i t dB t +<f> t (dN t -p t dt), Y o = 0, te[0,T] t 

where (A — 1 + ax) 59 ™ : [0, T\ n — > K zs defined as 

where (S t ) te r 0) Ti is a standard Brownian mo- n 

tion, i« = l Wt=0 }, Jt = 1 — and (iVt) te[0 , T ] (A - 1 + . . . ,t n ) = JJ(A - 1 + ax tk ). 

is a Poisson process independent of (B t ) t e[o,T] fc=i 

with intensity t i-> / p s ds with p s : = 4. ^o/: See E Section 3.5, p. 87]. ■ 

io This formulation of L and the definition (IVII.lt 

" of the Malliavin derivative in this context give 
- for = 1, (Y t )te[oT] is a Poisson process 

ft l D t L(j/, x) = (A - 1 + ax t )L(y, x), t G [0, T]. 

with intensity v t = \ — ds; (VI.2) 



o 



- for = 0, (y t )*6[o,n is a standard Brown- Definition V 1.2. For t in [0, T] tfe/zne V t as 
ian motion. 



Example 2) Consider t = /3 G [-2,0). Then VtF = f D S F ds, F e L 2 (Vl) satisfying <^TT2^. 

{Yt)te\o,T\ is an Azema martingale. This process Jo 

has the chaos decomposition property but its By using the gen eral Bayesian results presented 

increments are not independent contrary to the in Section [IV] we have the following Proposition, 
previous example. 

In this Section we assume that assumptions of the Proposition VI.3. E[A t |^J = 7777—, t G [0, TJ. 



Subsection irV-AI are in force and we recall that we 



m(Y) 



denote by E the expectation with respect to the Proof: One can mimic the proof of Proposition 

measure /i x x Hy, Ei the expectation with respect | IV-2|b y noticing that the key ingredient is formula 
to fly and E the expectation relative to P. dVL2]). ■ 



Let A and a two positive numbers. Let 
(X t )te[o,T] a real-valued input process with 

X t — J X s ds, t G [0, T). Assume the output 

signal (V t ) tg [ 0T ] is a normal martingale such that 
the measure (j, Y \x( m \%) is absolutely continuous 
with respect to P with likelihood given by 

L(y,x) 



dfi Y \x(-\x) 




x JJ(A + ax s 0(s))e~ (A - 1+ais ^ w . 

s<T 

We refer to [9, Theorem 37, p. 84] for technical 
justifications about the existence of L. 



B. Mutual information and conditional mean estimation

In this section we consider a particular example of the Gaussian-Poisson mixtures presented in Example 1) of Section VI. Let $(\phi_t)_{t\in[0,T]}$ be a deterministic function with values in $\{0,1\}$ and let $(Y_t)_{t\in[0,T]}$ be the martingale defined on a probability space $(\Omega,\mathcal{F},P)$ by

\[ dY_t = 1_{\{\phi_t=0\}}\,dB_t + \phi_t\,(dN_t - \pi(dt)), \quad t\in[0,T], \]

where $B$ and $N$ denote respectively a standard Brownian motion and an independent Poisson process with intensity the Lebesgue measure on $[0,T]$ denoted by $\pi$. This model is really a "hand-made" example of a mixture between Gaussian and Poisson regimes. Actually $\phi$ can be thought of as a "switch" enabling a user to pass from the Gaussian regime ($\phi_s = 0$) to the Poisson one ($\phi_s = 1$). In addition please note that we assume no restrictions on the number of switches from one state to another. The next Lemma and Theorem are the main results of this section.



Lemma VI.4. Assume that

$$E\left[\int_0^T \left( X_s^2 \, 1_{\{\phi_s = 0\}} + |X_s \log(X_s)| \, 1_{\{\phi_s = 1\}} \right) ds \right] < \infty, \tag{VI.3}$$

then the following relations hold:

i)
$$\frac{d}{da} E_1[\log m] = E\left[\int_0^T E[(\lambda - 1 + aX_s)\,|\,y] \; E[X_s\,|\,y] \, 1_{\{\phi_s = 0\}} \, ds\right] + \frac{1}{a}\, E\left[\int_0^T \log E[\lambda + aX_s\,|\,y] \; E[aX_s\,|\,y] \, 1_{\{\phi_s = 1\}} \, ds\right].$$

ii)
$$\frac{d}{d\lambda} E_1[\log m] = E\left[\int_0^T E[(\lambda - 1 + aX_s)\,|\,y] \, 1_{\{\phi_s = 0\}} \, ds\right] + E\left[\int_0^T \log E[\lambda + aX_s\,|\,y] \, 1_{\{\phi_s = 1\}} \, ds\right].$$

Proof: We only present the proof of i), the proof of ii) being very similar. We have that

$$\frac{d}{da} E_1[\log m] = \int_H \Bigg( E\left[\log m \, L(y,x) \, I_1\!\left(\frac{x}{1 + (\lambda - 1 + ax)\phi}\right)\right] - E\left[\log m \, L(y,x) \int_0^T \frac{(\lambda - 1 + ax_s)\, x_s \phi_s}{1 + (\lambda - 1 + ax_s)\phi_s}\, ds\right] - E\left[\log m \, L(y,x) \int_0^T (\lambda - 1 + ax_s)\, x_s \, 1_{\{\phi_s = 0\}}\, ds\right] \Bigg) \mu_X(dx).$$

Note that in this situation

$$I_1\!\left(\frac{x}{1 + (\lambda - 1 + ax)\phi}\right) = \int_0^T \frac{x_s}{1 + (\lambda - 1 + ax_s)\phi_s}\, dy_s = \int_0^T \frac{x_s}{1 + (\lambda - 1 + ax_s)\phi_s} \left( 1_{\{\phi_s = 0\}}\, dB_s + \phi_s (dN_s - ds) \right).$$

Then we make use of the Malliavin integration by parts formula (VII.3):

$$\frac{d}{da} E_1[\log m] = \int_H \Bigg( E\left[\int_0^T \frac{x_s \, D_s(\log m \, L(y,x))}{1 + (\lambda - 1 + ax_s)\phi_s}\, ds\right] - E\left[\log m \, L(y,x) \int_0^T \frac{(\lambda - 1 + ax_s)\, x_s \phi_s}{1 + (\lambda - 1 + ax_s)\phi_s}\, ds\right] - E\left[\log m \, L(y,x) \int_0^T (\lambda - 1 + ax_s)\, x_s \, 1_{\{\phi_s = 0\}}\, ds\right] \Bigg) \mu_X(dx).$$

We need a chain rule formula for the Malliavin derivative $D$, which is standard for normal martingales:

$$D_s(\log m \, L(y,x)) = L(y,x)\, D_s(\log m) + \log m \, D_s L(y,x) + \phi_s \, D_s(\log m)\, D_s(L(y,x)). \tag{VI.4}$$

Combining (VI.4) and (VI.2) we obtain that

$$\begin{aligned}
\frac{d}{da} E_1[\log m] = \int_H \Bigg( & E\left[\int_0^T L(y,x)\, \frac{x_s \, D_s(\log m)}{1 + (\lambda - 1 + ax_s)\phi_s}\, ds\right] + E\left[\int_0^T L(y,x)\, \frac{D_s(\log m)\,(\lambda - 1 + ax_s)\, x_s \phi_s}{1 + (\lambda - 1 + ax_s)\phi_s}\, ds\right] \\
& + E\left[\log m \, L(y,x) \int_0^T \frac{x_s (\lambda - 1 + ax_s)}{1 + (\lambda - 1 + ax_s)\phi_s}\, ds\right] - E\left[\log m \, L(y,x) \int_0^T \frac{(\lambda - 1 + ax_s)\, x_s \phi_s}{1 + (\lambda - 1 + ax_s)\phi_s}\, ds\right] \\
& - E\left[\log m \, L(y,x) \int_0^T (\lambda - 1 + ax_s)\, x_s \, 1_{\{\phi_s = 0\}}\, ds\right] \Bigg) \mu_X(dx) \\
= \int_H & E\left[\int_0^T D_s(\log m)\, L(y,x)\, x_s \, ds\right] \mu_X(dx).
\end{aligned}$$

Here the first two terms add up to $L(y,x)\, x_s D_s(\log m)$, and the last three terms cancel because $\phi_s$ takes its values in $\{0,1\}$.

Since $Y$ is a mixture of Gaussian and Poisson processes, one can show that the Malliavin derivative $D$ can be decomposed in two parts $D^B$ and $D^N$, where $D^B$ acts on the Gaussian (Brownian) part of a functional of $Y$ and where $D^N$ acts on the Poisson part of it. Actually $D^B$ is related to the Malliavin derivative presented in [13] and $D^N$ is the difference operator used in Sections II-V. More precisely we have that

$$D_s \log(m) = D^B_s \log(m)\, 1_{\{\phi_s = 0\}} + 1_{\{\phi_s = 1\}}\, D^N_s \log(m) = \frac{D^B_s m}{m}\, 1_{\{\phi_s = 0\}} + 1_{\{\phi_s = 1\}} \log\left(1 + \frac{D^N_s m}{m}\right),$$

leading to

$$\begin{aligned}
\frac{d}{da} E_1[\log m] &= \int_H \left( E\left[\int_0^T \frac{D^B_s m}{m}\, 1_{\{\phi_s = 0\}}\, L(y,x)\, x_s \, ds\right] + E\left[\int_0^T \log\left(1 + \frac{D^N_s m}{m}\right) L(y,x)\, 1_{\{\phi_s = 1\}}\, x_s \, ds\right] \right) \mu_X(dx) \\
&= E\left[\int_0^T E[(\lambda - 1 + aX_s)\,|\,y] \; E[X_s\,|\,y]\, 1_{\{\phi_s = 0\}}\, ds\right] + \frac{1}{a}\, E\left[\int_0^T \log E[\lambda + aX_s\,|\,y] \; E[aX_s\,|\,y]\, 1_{\{\phi_s = 1\}}\, ds\right].
\end{aligned}$$

We conclude this Section by the counterpart of Theorem V.3 in this context.

Theorem VI.5. Assume that condition (VI.3) is satisfied; then we have that

i)
$$\frac{d}{da} I(X;Y) = E\left[\int_0^T \Big[ (\lambda - 1 + aX_s) X_s - E[(\lambda - 1 + aX_s)\,|\,y]\; E[X_s\,|\,y] \Big] 1_{\{\phi_s = 0\}}\, ds\right] + \frac{1}{a}\, E\left[\int_0^T \Big[ \psi_\lambda(aX_s + \lambda) - \psi_\lambda\big(E[aX_s + \lambda\,|\,y]\big) \Big] 1_{\{\phi_s = 1\}}\, ds\right],$$

where $\psi_\lambda(x) := (x - \lambda)\log(x)$.

ii)
$$\frac{d}{d\lambda} I(X;Y) = E\left[\int_0^T \Big[ (\lambda - 1 + aX_s) - E[(\lambda - 1 + aX_s)\,|\,y] \Big] 1_{\{\phi_s = 0\}}\, ds\right] + E\left[\int_0^T \Big[ \log(aX_s + \lambda) - \log\big(E[aX_s + \lambda\,|\,y]\big) \Big] 1_{\{\phi_s = 1\}}\, ds\right].$$

Proof: The proof is very similar to the proof of Theorem V.3. We just mention that, from relation [13, (19)] and Lemma VI.1, we deduce that

$$\int_H E_x[\log L(y,x)]\, \mu_X(dx) = \int_H \left( -\int_0^T (\lambda - 1 + ax_s)\phi_s \, ds + \int_0^T \log\big(1 + (\lambda - 1 + ax_s)\phi_s\big)(\lambda + ax_s)\, ds + \frac{1}{2} \int_0^T (\lambda - 1 + ax_s)^2 \, 1_{\{\phi_s = 0\}}\, ds \right) \mu_X(dx).$$

Remark VI.6.
• We recover the result of [4] and [13] by taking $\phi = 0$ and $\lambda = 1$ in Theorem VI.5 i). Note also that when $\phi = 0$ the case $\lambda \neq 1$ is a bit artificial, since we know that the coefficients $\lambda$ and $a$ can be replaced by a single parameter: the signal-to-noise ratio (which coincides with $a$ when $\lambda = 1$).
• We recover the result of [5] and Theorem V.3 by taking $\phi = 1$ in Theorem VI.5.

VII. Appendix

In this Appendix we give some further elements of stochastic analysis in the framework of normal martingales. We use the notations of Section VI.

Definition VII.1. Let $Y$ be a normal martingale. For $n \geq 1$, let $L^2([0,T])^{\circ n}$ be the space of symmetric square-integrable functions $f_n$ in $n$ variables. For $f_n$ in $L^2([0,T])^{\circ n}$ define the iterated stochastic integral $I_n(f_n)$ by

$$I_n(f_n) := n! \int_0^T \int_0^{t_n^-} \cdots \int_0^{t_2^-} f_n(t_1, \ldots, t_n)\, dY_{t_1} \cdots dY_{t_n}.$$

For $f_0$ in $\mathbb{R}$ we let $I_0(f_0) := f_0$. In addition we have that

$$I_n(f_n) = n \int_0^T I_{n-1}\big(f_n(*,t)\, 1_{[0,t]^{n-1}}(*)\big)\, dY_t, \qquad n \geq 1,$$

where $f_n(*,t)$ denotes the element of $L^2([0,T])^{\circ (n-1)}$ obtained by considering $f_n$ with one variable fixed at $t$ (since $f_n$ is symmetric we assume that the first variable is fixed to be equal to $t$).
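As a sanity check of Definition VII.1 in the pure Poisson case ($\phi \equiv 1$), the following Python sketch uses the closed forms $I_1(1) = N_T - T$ and $I_2(1 \otimes 1) = (N_T - T)^2 - N_T$. The second identity is a standard fact for the compensated Poisson process and is an assumption here, since it is not stated in the text. The sketch verifies by exact summation against the Poisson law that $I_2$ is centered, orthogonal to $I_1$, and satisfies $E[I_2(1 \otimes 1)^2] = 2!\, T^2$. The helper names `poisson_pmf` and `expect` are hypothetical.

```python
import math

def poisson_pmf(k, t):
    """P(N_T = k) for a Poisson variable with mean t, computed in log space."""
    return math.exp(-t + k * math.log(t) - math.lgamma(k + 1))

def expect(g, t, kmax=200):
    """E[g(N_T)] by truncated exact summation (the tail beyond kmax is negligible)."""
    return sum(g(k) * poisson_pmf(k, t) for k in range(kmax))

T = 2.0  # illustrative horizon

def I1(k):
    # I_1(1) on the event {N_T = k}
    return k - T

def I2(k):
    # I_2(1 (x) 1) on the event {N_T = k} (assumed closed form)
    return (k - T) ** 2 - k

mean_I2 = expect(I2, T)                          # expected to vanish
var_I2 = expect(lambda k: I2(k) ** 2, T)         # expected to equal 2! * T^2
cross = expect(lambda k: I1(k) * I2(k), T)       # expected to vanish (orthogonality)
print(mean_I2, var_I2, 2 * T ** 2, cross)
```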

Definition VII.2. Denote for $n \geq 1$,

$$\mathcal{H}_n = \left\{ I_n(f_n), \; f_n \in L^2([0,T])^{\circ n} \right\}.$$

We say that $(Y_t)_{t \in [0,T]}$ has the chaos representation property if

$$L^2(\Omega) = \bigoplus_{n=0}^{\infty} \mathcal{H}_n,$$

that is, for every $F$ in $L^2(\Omega)$ there exists $(f_n)_{n \in \mathbb{N}}$ such that $f_n \in L^2([0,T])^{\circ n}$, $n \geq 1$, and

$$F = \sum_{n=0}^{\infty} I_n(f_n).$$

As an example, this property is true for the mixture of Gaussian and Poisson processes considered in Section VI.

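We introduce the Malliavin derivative with respect to $(Y_t)_{t \in [0,T]}$. Let

$$\mathcal{S} = \left\{ \sum_{k=0}^{n} I_k(f_k), \; f_k \in L^2([0,T])^{\circ k}, \; 0 \leq k \leq n, \; n \in \mathbb{N} \right\}.$$

We define the Malliavin derivative $D$ as the linear operator from $\mathcal{S}$ to $L^2(\Omega \times [0,T])$ by

$$D_t I_n(f_n) = n\, I_{n-1}(f_n(*,t)), \qquad dP \times dt\text{-a.e.} \tag{VII.1}$$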
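The defining relation for $D$ on the chaoses can be illustrated in the pure Poisson case ($\phi \equiv 1$), where $D$ coincides with the difference operator. The sketch below assumes that for a functional $F = g(N_T)$ one has $D_t F = g(N_T + 1) - g(N_T)$ for $t \in [0,T]$, and it reuses the assumed closed forms $I_1(1) = N_T - T$ and $I_2(1 \otimes 1) = (N_T - T)^2 - N_T$. It checks that $D_t I_2(1 \otimes 1) = 2 I_1(1)$ and $D_t I_1(1) = I_0(1) = 1$. All function names are hypothetical.

```python
import math

T = 1.5  # illustrative horizon

def I1(k):
    """I_1(1) evaluated on {N_T = k}."""
    return k - T

def I2(k):
    """I_2(1 (x) 1) evaluated on {N_T = k} (assumed closed form)."""
    return (k - T) ** 2 - k

def difference_operator(g):
    """D_t g(N_T) = g(N_T + 1) - g(N_T): adding one jump at time t in [0, T]."""
    return lambda k: g(k + 1) - g(k)

DI2 = difference_operator(I2)
DI1 = difference_operator(I1)

# n = 2: D_t I_2 = 2 I_1, and n = 1: D_t I_1 = I_0(1) = 1.
ok_n2 = all(math.isclose(DI2(k), 2 * I1(k), abs_tol=1e-12) for k in range(50))
ok_n1 = all(math.isclose(DI1(k), 1.0, abs_tol=1e-12) for k in range(50))
print(ok_n2, ok_n1)
```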

We state the Malliavin integration by parts formula relative to the process $Y$. Let $F$ be in $L^2(\Omega)$ and denote, as above, by $(f_n)_n$ the functions appearing in its chaotic decomposition. Assume that

$$\sum_{n=1}^{\infty} n \, n! \, \|f_n\|^2_{L^2([0,T]^n)} < \infty, \tag{VII.2}$$

then for every deterministic $h : [0,T] \to \mathbb{R}$ we have that

$$E[F \, I_1(h)] = E\left[\int_0^T D_s F \, h(s)\, ds\right], \tag{VII.3}$$

where $E$ denotes the expectation relative to $P$.
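The integration by parts formula can be checked numerically in a simple special case. The sketch below is an illustration under stated assumptions, not the general statement: it takes the pure Poisson regime ($\phi \equiv 1$), $h \equiv 1$ so that $I_1(h) = N_T - T$, and $F = g(N_T)$ with the arbitrary choice $g(k) = \log(1 + k)$, for which $D_s F = g(N_T + 1) - g(N_T)$. Both sides are computed by exact summation against the Poisson law. The helper names are hypothetical.

```python
import math

def poisson_pmf(k, t):
    """P(N_T = k) for a Poisson variable with mean t, computed in log space."""
    return math.exp(-t + k * math.log(t) - math.lgamma(k + 1))

def expect(g, t, kmax=200):
    """E[g(N_T)] by truncated exact summation (the tail beyond kmax is negligible)."""
    return sum(g(k) * poisson_pmf(k, t) for k in range(kmax))

T = 2.0  # illustrative horizon

def g(k):
    # Arbitrary test functional F = g(N_T); any g with enough integrability would do.
    return math.log(1.0 + k)

# Left-hand side: E[F I_1(h)] with h = 1, i.e. E[g(N_T) (N_T - T)].
lhs = expect(lambda k: g(k) * (k - T), T)

# Right-hand side: E[int_0^T D_s F h(s) ds] = T * E[g(N_T + 1) - g(N_T)].
rhs = T * expect(lambda k: g(k + 1) - g(k), T)

print(lhs, rhs)
```

The two printed values agree up to floating-point error, which is the Poisson instance of the duality between $I_1$ and $D$.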



